a31ba94b933be2190a4c90b611056a1897730cfe
front devel test 4 base
- challenge: "He Said She Said" classification challenge (2nd edition)
- submitter: devel
- submitted: 2023-11-03 11:50:59.429639 UTC
- file basename: out
dev-1 / 1d94f9c9c51d2576863e924ea06b116accd54fbd
Metric | Score |
---|---|
Likelihood | 0.00000 |
Accuracy | 0.52555 |
 | Likelihood | Accuracy |
---|---|---|
+H | 0.00000 | 0.00000 |
+C | 0.00000 | 0.00000 |
-C | 0.00000 | 0.52555 |
worst items
note: the gold standard is taken from the submission itself, not from the challenge data!
# | input | expected output | actual output | dev-0 Likelihood +C |
---|---|---|---|---|
1 | zakończyłem jakiś czas temu. Potem dość długie lata śpiewałem w chórze for-humans contaminated | 0 | 1 | 0.00000 |